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our  partiele  swarm  approaeh  eould  eonverge  to  an  ineorreet  strueture.  While  only  an 

initial  proof-of-eoneept,  the  framework  presented  here  could  be  an  important  first  step 

toward  genome-seale  eell-free  kinetie  modeling  of  the  biosynthetic  capacity  of  industrially 

important  organisms. 
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Abstract:  Cell-free  systems  offer  many  advantages  for  the  study,  manipulation  and 

modeling  of  metabolism  eompared  to  in  vivo  proeesses.  Many  of  the  ehallenges  eonfronting 
genome-seale  kinetie  modeling  ean  potentially  be  overeome  in  a  eell-free  system.  Eor 
example,  there  is  no  eomplex  transeriptional  regulation  to  eonsider,  transient  metabolie 
measurements  are  easier  to  obtain,  and  we  no  longer  have  to  eonsider  eell  growth.  Thus, 
eell-free  operation  holds  several  signifieant  advantages  for  model  development,  identifieation 
and  validation.  Theoretieally,  genome-seale  eell-free  kinetie  models  may  be  possible 
for  industrially  important  organisms,  sueh  as  E.  coli,  if  a  simple,  traetable  framework 
for  integrating  allosterie  regulation  with  enzyme  kineties  ean  be  formulated.  Toward 
this  unmet  need,  we  present  an  effeetive  bioehemieal  network  modeling  framework  for 
building  dynamie  eell-free  metabolie  models.  The  key  innovation  of  our  approaeh  is  the 
integration  of  simple  effeetive  rules  eneoding  eomplex  allosterie  regulation  with  traditional 
kinetie  pathway  modeling.  We  tested  our  approaeh  by  modeling  the  time  evolution  of 
several  hypothetieal  eell-free  metabolie  networks.  We  found  that  simple  effeetive  rules, 
when  integrated  with  traditional  enzyme  kinetie  expressions,  eaptured  eomplex  allosterie 
patterns  sueh  as  ultrasensitivity  or  non-eompetitive  inhibition  in  the  absenee  of  meehanistie 
information.  Seeond,  when  integrated  into  network  models,  these  rules  eaptured  elassie 
regulatory  patterns  sueh  as  produet-indueed  feedbaek  inhibition.  Easily,  we  showed,  at 
least  for  the  network  arehiteetures  eonsidered  here,  that  we  eould  simultaneously  estimate 
kinetie  parameters  and  allosterie  eonneetivity  from  synthetie  data  starting  from  an  unbiased 
eolleetion  of  possible  allosterie  struetures  using  partiele  swarm  optimization.  However, 
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when  starting  with  an  initial  population  that  was  heavily  enriehed  with  ineorreet  structures, 
our  particle  swarm  approach  could  converge  to  an  incorrect  structure.  While  only  an 
initial  proof-of-concept,  the  framework  presented  here  could  be  an  important  first  step 
toward  genome-scale  cell-free  kinetic  modeling  of  the  biosynthetic  capacity  of  industrially 
important  organisms. 

Keywords:  allosteric  regulation;  cell-free  metabolism;  heuristic  optimization;  mathematical 
modeling;  parameter  identification;  systems  biology 


1.  Introduction 

Mathematical  modeling  has  long  contributed  to  our  understanding  of  metabolism.  Decades  before 
the  genomics  revolution,  mechanistically,  structured  metabolic  models  arose  from  the  desire  to  predict 
microbial  phenotypes  resulting  from  changes  in  intracellular  or  extracellular  states  [1].  The  single 
cell  E.  coli  models  of  Shuler  and  coworkers  pioneered  the  construction  of  large-scale,  dynamic 
metabolic  models  that  incorporated  multiple,  regulated  catabolic  and  anabolic  pathways  constrained 
by  experimentally  determined  kinetic  parameters  [2].  Shuler  and  coworkers  generated  many  single 
cell  kinetic  models,  including  single  cell  models  of  eukaryotes  [3,4],  minimal  cell  architectures  [5], 
as  well  as  DNA  sequence  based  whole-cell  models  of  E.  coli  [6].  Conversely,  highly  abstracted 
kinetic  frameworks,  such  as  the  cybernetic  framework,  represented  a  paradigm  shift,  viewing  cells  as 
growth-optimizing  strategists  [7] .  Cybernetic  models  have  been  highly  successful  at  predicting  metabolic 
choice  behavior,  e.g.,  diauxie  behavior  [8],  steady-state  multiplicity  [9],  as  well  as  the  cellular  response  to 
metabolic  engineering  modifications  [10].  Unfortunately,  traditional,  fully  structured  cybernetic  models 
also  suffer  from  an  identifiability  challenge,  as  both  the  kinetic  parameters  and  an  abstracted  model  of 
cellular  objectives  must  be  estimated  simultaneously.  However,  recent  cybernetic  formulations  from 
Ramkrishna  and  colleagues  have  successfully  treated  this  identifiability  challenge  through  elementary 
mode  reduction  [11,12]. 

In  the  post  genomics  world,  large-scale  stoichiometric  reconstructions  of  microbial  metabolism 
popularized  by  static,  constraint-based  modeling  techniques  such  as  flux  balance  analysis  (FBA) 
have  become  standard  tools  [13].  Since  the  first  genome-scale  stoichiometric  model  of  E.  coli, 
developed  by  Edwards  and  Palsson  [14],  well  over  100  organisms,  including  industrially  important 
prokaryotes  such  as  E.  coli  [15]  or  B.  subtilis  [16],  are  now  available  [17].  Stoichiometric  models 
rely  on  a  pseudo-steady-state  assumption  to  reduce  unidentifiable  genome-scale  kinetic  models  to 
an  underdetermined  linear  algebraic  system,  which  can  be  solved  efficiently  even  for  large  systems. 
Traditionally,  stoichiometric  models  have  also  neglected  explicit  descriptions  of  metabolic  regulation 
and  control  mechanisms,  instead  opting  to  describe  the  choice  of  pathways  by  prescribing  an  objective 
function  on  metabolism.  Interestingly,  similar  to  early  cybernetic  models,  the  most  common  metabolic 
objective  function  has  been  the  optimization  of  biomass  formation  [18],  although  other  metabolic 
objectives  have  also  been  estimated  [19].  Recent  advances  in  constraint-based  modeling  have  overcome 
the  early  shortcomings  of  the  platform,  including  capturing  metabolic  regulation  and  control  [20]. 
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Thus,  modem  constraint-based  approaches  have  proven  extremely  useful  in  the  discovery  of  metabolic 
engineering  strategies  and  represent  the  state  of  the  art  in  metabolic  modeling  [21,22].  However, 
genome-scale  kinetic  models  of  industrial  important  organisms  such  as  E.  coli  have  yet  to  be  constructed. 

Cell-free  systems  offer  many  advantages  for  the  study,  manipulation  and  modeling  of  metabolism 
compared  to  in  vivo  processes.  Central  amongst  these  advantages  is  direct  access  to  metabolites  and  the 
microbial  biosynthetic  machinery  without  the  interference  of  a  cell  wall.  This  allows  us  to  control  as 
well  as  interrogate  the  chemical  environment  while  the  biosynthetic  machinery  is  operating,  potentially 
at  a  fine  time  resolution.  Second,  cell-free  systems  also  allow  us  to  study  biological  processes  without 
the  complications  associated  with  cell  growth.  Cell-free  protein  synthesis  (CFPS)  systems  are  arguably 
the  most  prominent  examples  of  cell-free  systems  used  today  [23].  However,  CFPS  is  not  new;  CFPS 
in  crude  E.  coli  extracts  has  been  used  since  the  1960s  to  explore  fundamentally  important  biological 
mechanisms  [24,25].  Today,  cell-free  systems  are  used  in  a  variety  of  applications  ranging  from 
therapeutic  protein  production  [26]  to  synthetic  biology  [27].  Interestingly,  many  of  the  challenges 
confronting  genome-scale  kinetic  modeling  can  potentially  be  overcome  in  a  cell-free  system.  For 
example,  there  is  no  complex  transcriptional  regulation  to  consider,  transient  metabolic  measurements 
are  easier  to  obtain,  and  we  no  longer  have  to  consider  cell  growth.  Thus,  cell-free  operation  holds 
several  significant  advantages  for  model  development,  identification  and  validation.  Theoretically, 
genome-scale  cell-free  kinetic  models  may  be  possible  for  industrially  important  organisms,  such  as 
E.  coli  or  B.  subtilis,  if  a  simple,  tractable  framework  for  integrating  allosteric  regulation  with  enzyme 
kinetics  can  be  formulated. 

In  this  study,  we  present  an  effective  biochemical  network  modeling  framework  for  building  dynamic 
cell-free  metabolic  models.  The  key  innovation  of  our  approach  is  the  seamless  integration  of  simple 
effective  rules  encoding  complex  regulation  with  traditional  kinetic  pathway  modeling.  This  integration 
allows  the  description  of  complex  regulatory  interactions,  such  as  time-dependent  allosteric  regulation 
of  enzyme  activity,  in  the  absence  of  specific  mechanistic  information.  The  regulatory  rules  are  easy 
to  understand,  easy  to  formulate  and  do  not  rely  on  overarching  theoretical  abstractions  or  restrictive 
assumptions.  We  tested  our  approach  by  modeling  the  time  evolution  of  several  hypothetical  cell-free 
metabolic  networks.  In  particular,  we  tested  whether  our  effective  modeling  approach  could  describe 
classically  expected  enzyme  kinetic  behavior,  and  second  whether  we  could  simultaneously  estimate 
kinetic  parameters  and  regulatory  connectivity,  in  the  absence  of  specific  mechanistic  knowledge, 
from  synthetic  experimental  data.  Toward  these  questions,  we  explored  five  hypothetical  cell-free 
networks.  Each  network  shared  the  same  enzymatic  connectivity,  but  had  different  allosteric  regulatory 
connectivity.  We  found  that  simple  effective  rules,  when  integrated  with  traditional  enzyme  kinetic 
expressions,  captured  complex  allosteric  patterns  such  as  ultrasensitivity  or  non-competitive  inhibition 
in  the  absence  of  mechanistic  information.  Second,  when  integrated  into  network  models,  these 
rules  captured  classical  regulatory  patterns  such  as  product-induced  feedback  inhibition.  Lastly,  we 
showed,  at  least  for  the  network  architectures  considered  here,  that  we  could  simultaneously  estimate 
kinetic  parameters  and  allosteric  connectivity  from  synthetic  data  starting  from  an  unbiased  collection 
of  possible  allosteric  structures  using  particle  swarm  optimization.  However,  when  starting  with  an 
initial  population  that  was  heavily  enriched  with  incorrect  structures,  our  particle  swarm  approach  could 
converge  to  an  incorrect  structure.  While  only  an  initial  proof-of-concept,  the  framework  presented 


Processes  2015,  3 


141 


here  eould  be  an  important  first  step  toward  genome-seale  cell-free  kinetic  modeling  of  the  biosynthetic 
capacity  of  industrially  important  organisms. 

2.  Results 

2.1.  Formulation  and  Properties  of  Effective  Cell-Free  Metabolic  Models 

We  developed  two  proof-of-concept  metabolic  networks  to  investigate  the  features  of  our  effective 
biochemical  network  modeling  approach  (Figure  1).  In  both  examples,  substrate  S  was  converted  to  the 
end  products  Pi  and  P2  through  a  series  of  enzymatically  catalyzed  reactions,  including  a  branch  point 
at  hypothetical  metabolite  M2.  Several  of  these  reactions  involved  cofactor  dependence  (AH  or  A),  and 
various  allosteric  regulatory  mechanisms  modified  the  activity  of  pathway  enzymes.  Network  A  included 
feedback  inhibition  of  the  initial  pathway  enzyme  {Ei)  by  pathway  end  products  Pi  and  P2  (Figure  lA). 
On  the  other  hand,  network  B  involved  feedback  inhibition  of  Ei  by  P2  and  Eq  by  Pi  (Figure  IB).  In  both 
networks,  branch  point  enzymes  E^  and  Eq  were  subject  to  feed-forward  activation  by  reduced  cofactor 
AH.  Lastly,  it  is  known  experimentally  that  cell-free  systems  have  a  finite  operational  lifespan.  Loss 
of  biosynthetic  capability  could  be  a  function  of  many  factors,  e.g.,  cofactor  or  metabolite  limitations. 
We  modeled  the  loss  of  biosynthetic  capability  as  a  non-specific  first-order  decay  of  enzyme  activity. 


Figure  1.  Proof-of-concept  cell-free  metabolic  networks  considered  in  this  study.  Substrate 
S  is  converted  to  products  Pi  and  P2  through  a  series  of  chemical  conversions  catalyzed  by 
enzyme(s)  Ej.  The  activity  of  the  pathway  enzymes  is  subject  to  both  positive  and  negative 
allosteric  regulation. 

Allosteric  regulation  of  enzyme  activity  was  modeled  by  combining  individual  regulatory 
contributions  to  the  activity  of  pathway  enzymes  into  a  control  coefficient  using  an  integration  rule 
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(Figure  2).  This  strategy  is  similar  in  spirit  to  the  Constrained  Fuzzy  Logie  (cFL)  approaeh  of 
Lauffenburger  and  eoworkers  whieh  has  been  used  to  effeetively  model  signal  transduetion  pathways 
important  in  human  health  [28].  In  our  formulation,  Hill-like  transfer  funetions  0  <  /  (Z)  <  1  were 
used  to  ealeulate  the  infiuenee  of  faetor  abundanee  upon  target  enzyme  aetivity.  In  this  eontext,  faetors 
ean  be  individual  metabolite  levels  or  some  funetion,  e.g.,  the  produet  of  metabolite  levels.  However, 
more  generally,  faetors  ean  also  eorrespond  to  non-modeled  influenees,  eategorial  variables  or  other 
abstraet  quantities.  In  the  current  study,  we  simply  let  Z  correspond  to  the  abundance  of  individual 
metabolites,  however  in  general  this  can  be  a  complex  function  of  both  modeled  and  unmodeled  factors. 
When  an  enzyme  was  potentially  sensitive  to  more  than  one  regulatory  input,  logical  integration  rules 
were  used  to  select  which  regulatory  transfer  function  influenced  enzyme  activity  at  any  given  time. 
Thus,  our  test  networks  involved  important  features  such  as  cofactor  recycling,  enzyme  activity  and 
metabolite  dynamics,  as  well  as  multiple  overlapping  allosteric  regulatory  mechanisms. 
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Figure  2.  Schematic  of  rule-based  allosteric  enzyme  activity  control  laws.  Traditional 
enzyme  kinetic  expressions,  e.g.,  Michaelis-Menten  or  multiple  saturation  kinetics,  are 
multiplied  by  an  enzyme  activity  control  variable  0  <  uj  <  1.  Control  variables  are 
functions  of  many  possible  regulatory  factors  encoded  by  arbitrary  functions  of  the  form 
0  <  fj  (Z)  <  1.  At  each  simulation  time  step,  the  vj  variables  are  calculated  by  evaluating 
integration  rules  such  as  the  max  or  min  of  the  set  of  factors  /i , . . .  influencing  the  activity 
of  enzyme  Ej. 


The  rule-based  regulatory  strategy  approximated  the  behavior  of  classical  allosteric  activation  and 
inhibition  mechanisms  (Figure  3).  We  considered  the  enzyme  catalyzed  conversion  of  substrate  S 
to  a  product  P,  where  the  overall  reaction  rate  was  modeled  as  the  product  of  a  Michaelis-Menten 
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term  and  an  effective  allosteric  control  variable  reflecting  the  particular  regulatory  interaction.  We 
first  explored  feed-forward  substrate  activation  of  enzyme  activity  (for  both  positive  and  negative 
cooperativity).  Consistent  with  classical  data,  the  rule-based  strategy  predicted  a  sigmoidal  relationship 
between  substrate  abundance  and  reaction  rate  as  a  function  of  the  cooperativity  parameter  (Figure  3A). 
For  cooperativity  parameters  less  than  unity,  increased  substrate  abundance  decreased  the  maximum 
reaction  rate.  This  was  consistent  with  the  idea  that  substrate  binding  decreased  at  regulatory  sites,  which 
negatively  impacted  substrate  binding  at  the  active  site.  On  the  other  hand,  as  the  cooperativity  parameter 
increased  past  unity,  the  rate  of  conversion  of  substrate  S  to  product  P  by  enzyme  E  approached  a 
step  function.  In  the  presence  of  an  inhibitor,  the  rule-based  strategy  predicted  non-competitive  like 
behavior  as  a  function  of  the  cooperativity  parameter  (Figure  3B).  When  the  control  gain  parameter,  Kij 
in  Equaion  (10),  was  greater  than  unity,  the  inhibitory  force  was  directly  proportional  to  the  cooperativity 
parameter,  r]  in  Equation  (10).  Thus,  as  the  cooperativity  parameter  increased,  the  maximum  reaction 
rate  decreased  (Eigure  3B).  Interestingly,  our  rule-based  approach  was  unable  to  directly  simulate 
competitive  inhibition  of  enzyme  activity.  Taken  together,  the  rule-based  strategy  captured  classical 
regulatory  patterns  for  both  enzyme  activation  and  inhibition.  Thus,  we  are  able  to  model  complex 
kinetic  phenomena  such  as  ultrasensitivity,  despite  an  effective  description  of  reaction  kinetics. 

Inhibitor  =  10  A.U 


A  B 


Figure  3.  Kinetics  of  simple  transformations  in  the  presence  of  activation  and  inhibition. 
(A)  The  conversion  of  substrate  S  to  product  P  by  enzyme  E  was  activated  by  S. 
Eor  a  fixed  control  gain  parameter  k control,  the  reaction  rate  approached  a  step  for 
increasing  cooperativity  control  parameter  r].  Eor  activation  simulations  ^control  =  0-05 
and  7]  =  {0.01,0.1,1,2,4,6,8,10};  (B)  The  conversion  of  substrate  S  to  product  P  by 
enzyme  E  with  inhibitor  /.  Eor  a  fixed  control  gain  parameter  ^control,  the  reaction  rate 
approximated  non-competitive  inhibition  for  increasing  cooperativity  control  parameter  rj. 
Eor  the  inhibition  simulations  Kcontroi  =  1-5  and  r]  =  (0.01, 0.1, 1,  2, 4, 6,  8, 10}. 
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End  product  yield  was  controlled  by  feedback  inhibition,  while  product  selectivity  was  controlled  by 
branch  point  enzyme  inhibition  (Figure  4).  A  critical  test  of  our  modeling  approach  was  to  simulate 
networks  with  known  behavior.  If  we  cannot  reproduce  the  expected  behavior  of  simple  networks,  then 
our  effective  modeling  strategy,  and  particularly  the  rule-based  approximation  of  allosteric  regulation, 
will  not  be  feasible  for  genome-scale  cell-free  problems.  We  considered  two  cases,  control  ON/OFF,  for 
each  network  configuration.  Each  of  these  cases  had  identical  kinetic  parameters  and  initial  conditions; 
the  only  differences  between  the  cases  were  the  allosteric  regulation  rules  and  the  control  parameters 
associated  with  these  rules.  As  expected,  end  product  accumulation  was  larger  for  network  A  when  the 
control  was  OFF  (no  feedback  inhibition  of  Ei  by  Pi  and  P2),  as  compared  to  the  ON  case  (Figure  4A). 
We  found  this  behavior  was  robust  to  the  choice  of  underlying  kinetic  parameters,  as  we  observed 
that  same  qualitative  response  across  an  ensemble  of  100  randomized  parameter  sets,  for  fixed  control 
parameters.  The  control  ON/OFF  response  of  network  B  was  more  subtle.  In  the  OFF  case,  the  behavior 
was  qualitatively  similar  to  network  A.  However,  for  the  ON  case,  flux  was  diverted  away  from  P2 
formation  by  feedback  inhibition  of  Eq  activity  at  the  M2  branch  point  by  Pi  (Figure  4B).  Fower  Eq 
activity  at  the  M2  branch  point  allowed  more  flux  toward  Pi  formation,  hence  the  yield  of  Pi  also 
increased  (Figure  4C).  Again,  the  control  ON/OFF  behavior  of  network  B  was  robust  to  changes  in 
kinetic  parameters,  as  the  same  qualitative  trend  was  conserved  across  an  ensemble  of  100  randomized 
parameters,  for  fixed  control  parameters.  Taken  together,  these  simulations  suggested  that  the  rule-based 
allosteric  control  concept  could  robustly  capture  expected  feedback  behavior  for  networks  with  uncertain 
kinetic  parameters. 

2.2.  Estimating  Parameters  and  Effective  Allosteric  Regulatory  Structures 

A  critical  challenge  for  any  dynamic  model  is  the  estimation  of  kinetic  parameters.  For  metabolic 
processes,  there  is  also  the  added  challenge  of  identifying  the  regulation  and  control  structures  that 
manage  metabolism.  Of  course,  these  issues  are  not  independent;  any  description  of  enzyme  activity 
regulation  will  be  a  function  of  system  state,  which  in  turn  depends  upon  the  kinetic  parameters. 
For  cell-free  systems,  regulated  gene  expression  has  been  removed,  however,  enzyme  activity  regulation 
is  still  operational.  We  explored  this  linkage  by  estimating  model  parameters  from  synthetic  data  using 
both  network  structures.  We  generated  synthetic  measurements  of  the  substrate  S,  intermediate  M5 
and  end  product  Pi  approximately  every  20  min  using  network  A.  This  data  set  is  similar  to  published 
cell-free  studies,  both  in  terms  of  network  coverage  and  sampling  frequency  [23].  We  then  generated  an 
ensemble  of  model  parameter  estimates  by  minimizing  the  difference  between  model  simulations  and  the 
synthetic  data  using  particle  swarm  optimization  (PSO),  starting  from  random  initial  parameter  guesses. 
The  estimation  of  kinetic  parameters  was  sensitive  to  the  choice  of  regulatory  structure  (Figure  5). 
PSO  identified  an  ensemble  of  parameters  that  bracketed  the  mean  of  the  synthetic  measurements 
in  less  than  1000  iterations  when  the  control  structure  was  correct  (Figure  5A,B).  However,  with 
control  mismatch  (network  B  simulated  with  network  A  parameters),  model  simulations  were  not 
consistent  with  the  synthetic  data  (Figure  5C,D).  Taken  together,  these  results  suggested  that  we  could 
perhaps  simultaneously  estimate  both  parameters  and  network  control  architectures,  as  incorrect  control 
structures  would  be  manifest  as  poor  model  fits. 
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Time  [min] 


Figure  4.  ON/OFF  control  simulations  for  Network  A  and  Network  B  for  an  ensemble 
of  100  kinetie  parameter  sets  versus  time.  For  eaeh  ease,  simulations  were  eonducted  using 
kinetie  and  initial  eonditions  generated  randomly  from  a  hypothetical  true  parameter  set.  The 
gray  area  represents  ±  one  standard  deviation  surrounding  the  mean.  Control  parameters 
were  fixed  during  the  ensemble  ealculations.  (A)  End  product  Pi  abundance  versus  time 
for  Network  A.  The  abundanee  of  Pi  deereased  with  end  product  inhibition  of  Ei  activity 
(Control-ON)  versus  the  no  inhibition  ease  (Control- OFF);  (B)  End  produet  P2  abundanee 
versus  time  for  Network  B.  Inhibition  of  branch  point  Eq  by  end  product  Pi  decreased  P2 
abundanee  (Control-ON)  versus  the  no  inhibition  case  (Control-OEE);  (C)  End  produet  Pi 
abundanee  versus  time  for  Network  A.  Inhibition  of  branch  point  Eq  by  end  produet  Pi 
deereased  Pi  abundanee  (Control-ON)  versus  the  no  inhibition  ease  (Control-OEE). 
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Control  Policy  A  (simulation)  =  Control  Policy  A  (experiment)  Control  Policy  B  (simulation)  =  Control  Policy  A  (experiment) 


Time  [min]  Time  [min] 


Figure  5.  Parameter  estimation  from  synthetie  data  for  the  same  and  mismatehed  allosterie 
eontrol  logie  using  partiele  swarm  optimization  (PSO).  Synthetie  experimental  data  was 
generated  from  a  hypothetical  parameter  set  using  Network  A,  where  substrate  S,  end 
product  Pi  and  intermediate  M5  were  sampled  approximately  every  20  min.  For  cases 
(A,B)  20  particles  were  initialized  with  randomized  parameters  and  allowed  to  search  for 
300  iterations.  (A,B)  PSO  estimated  an  ensemble  of  20  parameters  sets  consistent  with  the 
synthetic  experimental  data  assuming  the  correct  enzymatic  and  control  connectivity  starting 
from  randomized  initial  parameters;  (C,D)  In  the  presence  of  control  mismatch  (Network  B 
control  policy  simulated  with  Network  A  kinetic  parameters)  the  ensemble  of  models  did 
not  describe  the  synthetic  data.  The  synthetic  data  plotted  here  was  unperturbed  by  noise. 
However,  we  assumed  a  constant  coefficient  of  variation  of  10%  for  the  synthetic  data  during 
parameter  estimation. 

We  modified  our  particle  swarm  identification  strategy  to  simultaneously  search  over  both  kinetic 
parameters  and  putative  control  structures.  In  addition  to  our  initial  networks,  we  constructed  three 
additional  presumptive  network  models,  each  with  the  same  enzymatic  connectivity  but  different 
allosteric  regulation  of  the  pathway  enzymes  (Figure  6).  We  then  initialized  a  population  of  particles, 
each  with  one  of  the  five  potential  regulatory  programs  and  randomized  kinetic  parameters.  Thus, 
we  generated  an  initial  population  of  particles  that  had  both  different  kinetic  parameters  as  well  as 
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different  control  structures.  We  biased  the  distribution  of  the  particle  population  according  to  our  a  prior 
belief  of  the  correct  regulatory  program.  To  this  end,  we  considered  three  different  priors,  a  uniform 
distribution  where  each  putative  regulatory  structure  represented  20%  of  the  population  and  two  mixed 
distributions  that  were  either  positively  or  negatively  biased  towards  the  correct  structure  (network  A). 
In  both  the  positively  biased  and  uniform  cases  the  PSO  clearly  differentiated  between  the  true  or  closely 
related  structures  and  those  that  were  materially  different  (Figure  7).  As  expected,  the  positively  biased 
population  (40%  of  the  initial  particle  population  seeded  with  network  A)  gave  the  best  results,  where 
the  correct  structure  was  preferentially  identified  (Figure  7A).  On  the  other  hand,  when  given  a  uniform 
distribution,  the  PSO  approach  identified  a  combination  of  network  A  and  network  C  as  the  most  likely 
control  structures  (Figure  7B).  Network  A  and  C  differ  by  the  regulatory  connection  between  the  end 
product  P2  and  enzyme  Ep,  in  network  A,  end  product  P2  was  assumed  to  inhibit  Ei,  while  in  network  C, 
end  product  P2  activated  Ei.  Lastly,  when  the  initial  population  was  heavily  biased  towards  incorrect 
structures  (initial  population  seeded  with  90%  incorrect  structures),  the  particle  swarm  misidentified  the 
correct  allosteric  structure  (Figure  7C).  Interestingly,  while  each  particle  swarm  identified  parameter 
sets  that  minimized  the  simulation  error,  the  estimated  parameter  values  were  not  necessarily  similar  to 
the  true  parameters.  The  angle  between  the  estimated  and  true  parameters  was  not  consistently  small 
across  the  swarms  (identical  parameters  would  give  an  angle  of  zero).  This  suggested  that  our  particle 
swarm  approach  identified  a  sloppy  ensemble,  i.e.,  parameter  estimates  that  were  individually  incorrect 
but  collectively  exhibited  the  correct  model  behavior. 


Figure  6.  Schematic  of  the  alternative  allosteric  control  programs  used  in  the  structural 
particle  swarm  computation.  Each  network  had  the  same  enzymatic  connectivity,  initial 
conditions  and  kinetic  parameters,  but  alternative  feedback  control  structures  for  the  first 
enzyme  in  the  pathway. 
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Figure  7.  Combined  control  and  kinetic  parameter  search  using  modified  particle  swarm 
optimization  (PSO).  A  population  of  100  particles  was  initialized  with  randomized  kinetic 
parameters  and  one  of  five  possible  control  configurations  (Network  A-E).  Simulation 
error  was  minimized  for  a  synthetic  data  set  (S',  end  product  Pi  and  intermediate  M5 
sampled  approximately  every  20  min)  generated  using  Network  A.  (A)  Simulation  error 
versus  parameter  set  angle  for  100  particles  biased  toward  the  correct  regulatory  program 
(A,B,C,D,E)  =  (40%,  10%,  20%,  20%  and  10%);  (B)  Simulation  error  versus  parameter 
set  angle  for  100  uniformly  distributed  particles  (A,B,C,D,E)  =  (20%,  20%,  20%,  20%  and 
20%);  (C)  Simulation  error  versus  parameter  set  angle  for  100  negatively  biased  particles 
(A,B,C,D,E)  =  (10%,  40%,  10%,  20%  and  20%).  Network  A  (the  correct  structure) 
was  preferentially  identified  for  positively  and  uniform  biased  particle  distributions,  but 
misidentified  in  the  presence  of  a  large  incorrect  bias. 
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Figure  8.  Metabolic  flux  and  control  variables  as  a  function  of  network  type  and  particle 
index  at  t  =  100  min.  The  particle  error,  the  control  variables  governing  Ei,  and  Eq 
activity  (^1,^3  and  V3)  and  the  scaled  metabolic  flux  were  calculated  for  the  positively 
(top),  uniformly  (middle)  and  negatively  (bottom)  biased  particle  swarms  (N  =  100).  Blue 
denotes  a  low  value,  while  red  denotes  a  high  value  for  the  respective  quantity  being  plotted. 
The  particles  from  each  swarm  were  sorted  based  upon  simulation  error  (low  to  high  error). 
(A)  Model  performance  for  the  positively  biased  particle  swarm  as  a  function  of  particle 
index;  (B)  Model  performance  for  the  uniformly  biased  particle  swarm  as  a  function  of 
particle  index;  (C)  Model  performance  for  the  negatively  biased  particle  swarm  as  a  function 
of  particle  index.  Models  with  significant  control  mismatch  showed  distinct  control  and  flux 
patterns  versus  those  models  with  the  correct  or  closely  related  control  policies.  In  particular, 
models  with  the  correct  control  policy  showed  stronger  inhibition  of  Ei  activity,  leading  to 
decreased  flux  from  S— t-Pi.  Conversely,  models  with  significant  mismatch  had  increased  Ei 
activity,  leading  to  an  altered  flux  distribution.  This  is  especially  apparent  in  the  negatively 
biased  particle  swarm. 


We  calculated  control  program  output  and  scaled  metabolic  flux  for  the  positively,  uniformly  and 
negatively  biased  particle  swarms  (Figure  8).  Network  A  and  network  C  models  from  the  positively 
(Figure  8 A)  and  uniformly  (Figure  8B)  biased  particle  swarms  showed  similar  operational  patterns, 
despite  differences  in  kinetic  parameters  and  control  structures.  While  models  from  the  negatively 
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biased  population  had  error  values  similar  to  the  eorreet  struetures  in  the  previous  swarms,  they  have 
different  flux  and  eontrol  profiles  (Figure  8C).  In  all  eases,  regardless  of  network  eonfiguration  or 
parameter  values,  the  rate  of  enzyme  deeay  was  small  eompared  to  the  other  fluxes,  and  all  networks 
had  qualitatively  similar  trends  for  E^,  and  Eq  eontrol.  Moreover,  eonsistent  with  the  eorreet  model 
strueture,  produetion  of  end  produet  Pi  was  the  preferred  braneh  for  all  model  eonfigurations.  However, 
there  was  variability  in  P2  produetion  flux  aeross  the  population  of  models,  espeeially  for  the  uniform 
swarm  when  eompared  with  the  other  eases.  High  Pi  braneh  flux  resulted  in  end  produet  inhibition  of 
El  in  both  network  A  and  network  C,  however  in  network  D  and  E,  high  Pi  flux  indueed  Ei  aetivation. 
These  trends  were  manifested  in  different  flux  profiles,  where  the  negatively  biased  population  appeared 
more  uniform  aeross  the  population  eompared  with  the  other  swarms,  and  had  higher  Ei  speeifie  aetivity. 
Interestingly,  the  behavior  of  network  A  and  network  C  highlighted  an  artifaet  of  our  integration  rule; 
both  a  positive  or  negative  feedbaek  eonneetion  from  P2  to  Ei  were  ignored  beeause  the  Pi  inhibition  of 
El  dominated.  Thus,  while  theoretieally  distinet,  network  A  and  network  C  appeared  operationally  to  the 
PSO  algorithm  to  be  the  same  network.  On  the  other  hand,  networks  B,  D  and  E  showed  distinet  behavior 
that  was  not  eonsistent  with  the  true  network.  These  arehiteetures  exhibited  either  limited  inhibition 
(network  B)  or  aetivation  (network  D  and  E)  of  Ei  aetivity,  resulting  in  signifieantly  different  metabolie 
flux  profiles.  However,  the  PSO  was  able  to  find  low  error  parameter  solutions,  despite  the  mismateh 
in  the  eontrol  struetures  (error  values  similar,  but  not  better  than  the  best  network  A  and  network  C 
estimates).  Taken  together,  these  results  suggested  that  a  uniform  sampling  approaeh  eould  potentially 
yield  an  unbiassed  estimate  of  both  kinetie  parameters  and  eontrol  struetures.  However,  the  negatively 
biased  partiele  swarm  results  illustrated  a  potential  shorteoming  of  the  approaeh,  namely  eonvergenee  to 
a  loeal  error  minimum  despite  a  signifieantly  ineorreet  eontrol  strueture.  This  suggested  that  estimated 
model  struetures  will  need  to  be  further  evaluated,  for  example  by  generating  falsifiable  experimental 
designs  whieh  eould  distinguish  between  low  error  solutions. 

3.  Discussion 

In  this  study,  we  presented  an  effeetive  kinetie  modeling  strategy  to  dynamieally  simulate  eell-free 
bioehemieal  networks.  Our  proposed  strategy  integrated  traditional  kinetie  modeling  with  an  effeetive 
rules  based  approaeh  to  dynamieally  deseribe  metabolie  regulation  and  eontrol.  We  tested  this  approaeh 
by  developing  kinetie  models  of  hypothetieal  eell-free  metabolie  networks.  In  partieular,  we  tested 
whether  our  effeetive  modeling  approaeh  eould  deseribe  elassieally  expeeted  behavior,  and  seeond 
whether  we  eould  simultaneously  estimate  kinetie  parameters  and  regulatory  eonneetivity,  in  the  absenee 
of  speeifie  meehanistie  knowledge,  from  synthetie  experimental  data.  Toward  these  questions,  we 
explored  five  hypothetieal  eell-free  networks.  In  eaeh  network,  a  substrate  S  was  eonverted  to  the 
end  produets  Pi  and  P2  through  a  series  of  enzymatieally  eatalyzed  reaetions,  ineluding  a  braneh 
point  at  a  hypothetieal  metabolite  M2.  Eaeh  network  also  ineluded  the  same  eofaetors  and  eofaetor 
reeyele  arehiteeture.  However,  while  all  five  networks  shared  the  same  enzymatie  eonneetivity, 
eaeh  had  different  allosterie  regulatory  eonneetivity.  We  found  that  simple  effeetive  rules,  when 
integrated  with  traditional  enzyme  kinetie  expressions,  eould  eapture  eomplex  allosterie  patterns  sueh 
as  ultrasensitivity,  or  non-eompetitive  inhibition  in  the  absenee  of  speeifie  meehanistie  information. 
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Moreover,  when  integrated  into  network  models,  these  rules  eaptured  elassieal  regulatory  patterns 
sueh  as  produet-indueed  feedbaek  inhibition.  Lastly,  we  simultaneously  estimated  kinetie  parameters 
and  diseriminated  between  eompeting  regulatory  struetures,  using  synthetie  data  in  eombination  with 
a  modified  partiele  swarm  approaeh.  If  we  eonsidered  all  putative  regulatory  arehiteetures  to  be 
equally  likely,  we  were  able  to  estimate  a  sloppy  ensemble  of  models  with  the  eorreet  arehiteeture 
and  kinetie  parameters.  Thus,  we  identified  parameter  values  that  were  different  from  their  true 
values,  but  nonetheless  produeed  reasonable  model  performanee  (low  error).  This  suggested  that  we 
eaptured  important  parameter  eombinations  (stiff  eombinations),  while  simultaneously  missing  other 
parameter  eombinations  (sloppy  eombinations).  This  was  similar  to  the  earlier  study  of  Brown  and 
Sethna  [29],  whieh  showed  that  reasonable  model  predietions  were  possible,  despite  sometimes  only 
order  of  magnitude  parameter  estimates,  if  the  stiff  parameter  eombinations  were  well  eonstrained. 

The  proposed  modeling  strategy  shares  features  with  other  popular  teehniques,  but  also  has  several 
key  differenees.  At  its  eore,  our  effeetive  modeling  approaeh  is  similar  to  regulatory  eonstraint-based 
methods,  and  to  the  eybernetie  modeling  paradigm  developed  by  Ramkrishna  and  eolleagues.  Covert, 
Palsson  and  eoworkers  drastieally  improved  the  predietability  of  eonstraint-based  approaehes  by 
integrating  Boolean  rules  into  the  ealeulation  of  metabolie  fluxes  [30].  If  the  regulated  intraeellular 
flux  problem  is  eoupled  with  time-dependent  extraeellular  balanees,  these  models  ean  prediet  eomplex 
behavior  sueh  as  diauxie  growth  or  the  switeh  between  aerobie  and  anaerobie  metabolism.  Another 
important  feature  of  this  approaeh  is  that  it  seales  with  biologieal  eomplexity.  For  example.  Covert  et  al. 
showed  that  a  genome-seale  model  of  E.  coli  augmented  with  a  Boolean  rule  layer,  eorreetly  predieted 
approximately  80%  of  the  outeomes  of  a  high-throughput  growth  phenotyping  experiment  in  E.  coli. 
Further,  they  showed  that  they  eould  learn  new  biology  by  iteratively  refining  the  model  and  its  assoeiated 
rules  [31].  However,  while  regulated  flux  balanee  analysis  is  a  powerful  teehnique,  it  does  not  easily 
allow  the  ealeulation  of  time-resolved  metabolite  abundanee.  Additionally,  the  Boolean  rules  whieh 
populate  the  regulatory  layer  are  limited  to  ON/OFF  deeisions;  for  qualitative  predietions  of  gene 
expression  this  is  a  reasonable  limitation.  However,  Boolean  rules  will  likely  be  less  effeetive  at 
eapturing  dynamie  allosterie  regulation  in  a  eell-free  metabolie  system.  On  the  other  hand,  the  strength 
of  eybernetie  models  is  the  integration  of  optimal  metabolie  eontrol  strategies  with  traditional  kinetic 
pathway  modeling.  Cybernetic  models  are  highly  predictive;  they  have  successfully  predicted  mutant 
behavior  from  limited  wild-type  data  [10,32,33],  steady-state  multiplicity  [9],  strain  specific  metabolic 
function  [12]  and  have  been  used  in  bioprocess  control  applications  [34].  However,  cybernetic  control 
strategies  are  not  mechanistic,  instead  they  are  the  output  of  an  optimal  decision  with  respect  to  a  set 
of  hypothetical  physiological  objectives.  Thus,  they  are  abstractions  which  are  difficult  to  translate 
into  a  specific  biological  mechanism.  Our  approach  addresses  the  shortcomings  of  both  regulatory 
constraint-based  models  and  cybernetic  models.  First,  similar  to  cybernetic  models,  the  core  of  our 
approach  is  a  kinetic  model.  Thus,  we  are  able  to  directly  calculate  the  time  evolution  of  metabolism, 
for  example  the  dynamic  abundance  of  network  metabolites.  Second,  similar  to  regulatory  flux  balance 
analysis,  our  control  laws  describe  specific  mechanistic  motifs,  such  as  activation  or  inhibition  of  enzyme 
activity.  However,  our  rules  are  continuous,  thus  they  potentially  allow  a  finer  grained  description  of 
metabolic  regulation  and  control  mechanisms.  Lastly,  we  can  naturally  incorporate  unmodeled  factors 
and  categorical  factors  or  combinations  thereof  into  our  control  law  formulations.  Though  requiring 
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a  more  complex  description  of  cellular  metabolism,  our  approach  may  even  be  extended  to  simulate 
cell-based  systems  by  incorporating  the  same  control  laws  into  transcription  factor  activation  and  gene 
expression  regulation. 

The  proposed  modeling  framework  also  differs  appreciably  from  previously  established  kinetic 
approximations  of  complex  biochemical  network  behavior.  Such  frameworks  replace  parameter  dense 
mechanistic  kinetic  expressions  with  heuristics  quantifying  the  relationship  between  metabolic  rate  and 
metabolite  effectors.  A  review  of  approximative  kinetic  formats  can  be  found  in  [35].  These  approaches 
arose  in  response  to  uncertainties  associated  with  obtaining  correct  mechanistic  kinetic  expressions  and 
parameters  of  in  vivo  systems.  Similarly,  available  kinetic  parameters  measured  in  vitro  may  differ  in 
a  specialized  cell- free  in  vitro  environments.  Factors  affecting  kinetics,  such  as  enzyme  channeling, 
macromolecular  crowding,  and  pH,  are  likely  dramatically  different  in  cell-free  environments  than  in 
both  in  vivo  systems  as  well  as  typical  in  vitro  conditions  used  for  parameter  measurements.  Thus,  a 
more  generalized,  approximate  biochemical  reaction  network  formulation  may  be  desirable  in  the  case 
of  cell-free  systems.  Our  approach  is  similar  to  generalized  mass  action-based  power  law  formulations 
of  Savageau  and  colleagues  [36]  and  linlog  kinetics  of  Visser  et  al.  [37]  in  that  metabolic  rates  are 
proportional  to  corresponding  enzyme  levels  modified  by  metabolite  effectors.  Power  law  and  linlog 
approaches  suffer  from  several  limitations  [35].  Power  law  reaction  rates  do  not  capture  saturation 
effects  and  become  infinite  for  small  concentrations  of  inhibitory  regulators.  Linlog  kinetics  also  become 
ill-defined  when  effector  concentrations  go  to  zero.  Also,  models  employing  linlog  kinetics  typically  rely 
on  an  experimentally  determined  reference  state  to  describe  dynamics  taking  place  after  a  perturbation 
to  a  steady-state.  Our  framework  does  not  suffer  from  such  drawbacks.  Moreover,  cell-free  systems 
are  unlikely  to  satisfy  such  a  steady-state  approximation  after  extract  preparation  and  prior  to  culture 
initiation.  Our  framework  is  similar  to  the  generic  kinetic  formulation  from  Hadlich  et  al.  [38] ,  but  differs 
in  its  inclusion  of  cooperative  effects  as  well  as  proposes  a  simplified  integration  of  competition  amongst 
allosteric  effectors  using  max/min  rules.  In  summary,  our  proposed  framework  offers  an  effective  kinetic 
approximation  that  captures  saturation  effects  and  allosteric  competition  within  cell-free  systems  that 
may  also  be  extensible  to  in  vivo  metabolic  and  gene  regulatory  networks. 

There  are  several  critical  questions  that  should  be  explored  following  this  proof-of-concept  study. 
It  is  unclear  how  parameter  identification  will  scale  to  genome-scale  networks,  and  second  it  is 
unclear  how  we  will  identify  allosteric  connectivity  at  a  genome-scale.  The  enzymatic  connectivity 
for  genome-scale  cell-free  networks  can  easily  be  established  by  stripping  away  the  growth  and  cell 
wall  machinery  from  whole  cell  genome  reconstructions.  Then  metabolic  fluxes  can  be  transformed 
into  kinetic  expressions  using  heuristics  such  multiple  saturation  kinetics,  which  are  then  modified  by 
our  rule -based  control  variables.  This  leaves  a  large  number  of  unknown  kinetic  constants  that  must 
be  estimated  from  time-resolved  metabolite  measurements.  Ensemble  modeling  is  a  well-established 
approach  for  parameter  identification  in  large-scale  deterministic  models.  Liao  and  coworkers  developed 
a  method  that  generates  an  ensemble  of  kinetic  models  that  all  approach  the  same  steady-state, 
one  determined  by  fluxomics  measurements  [39].  The  best  subpopulation  of  candidate  models 
were  selected  based  on  their  agreement  with  further  measurements  of  genetically  perturbed  systems. 
Our  work  relies  on  heuristic  search  optimization  to  identify  kinetic  models  consistent  with  steady-state 
and  dynamic  time-series  measurements  of  cellular  species  [40-45].  Instead  of  estimating  a  single  yet 
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highly  uncertain  parameter  set,  both  approaches  estimate  an  ensemble  of  parameter  sets  whose  model 
behavior  reeapitulates  experimental  measurements.  Here,  we  showed  that  partiele  swarm  optimization 
quiekly  identified  an  ensemble  of  model  parameters,  at  least  for  proof-of-eoneept  metabolic  networks 
using  synthetic  data.  This  suggested  that  we  can  expect  reasonable  model  predietions,  despite  only  partial 
parameter  knowledge,  as  network  size  grows  if  we  have  properly  designed  experiments.  Though  we 
expeet  computational  complexity  will  scale  poorly  with  network  size,  we  are  optimistic  that  large-scale, 
predietive  models  of  metabolism  are  possible.  There  is  evidenee  to  suggest  that  achieving  a  quantitative 
understanding  of  eomplex  biologieal  systems  should  not  require  complete  parametric  knowledge.  Brown 
and  Sethna  showed  in  a  model  of  signal  transduetion  that  good  predictions  were  possible  despite  only 
order  of  magnitude  estimates  of  parameter  values  [29].  Sethna  and  eoworkers  later  showed  that  model 
performance  is  often  eontrolled  by  only  a  few  parameter  combinations,  a  characteristic  seemingly 
universal  to  multi-parameter  models  referred  to  as  sloppiness  [46].  We  have  also  demonstrated  sloppy 
behavior  in  a  wide  variety  of  signal  transduetion  processes  [40^5].  Thus,  given  our  previous  experienee 
with  models  eontaining  hundreds  of  unknown  parameters,  we  expeet  parameter  estimation  to  be  a 
manageable  ehallenge  assuming  we  have  good  quality  experimental  data. 

A  seeond  eritieal  ehallenge  will  be  the  estimation  of  allosterie  eonneetivity  at  a  genome  seale. 
The  regulation  of  glyeolytie  enzymes,  such  as  phosphofructokinase  I,  has  been  studied  for  many 
years  [47,48].  The  allosteric  regulation  of  metabolie  enzymes  can  also  be  established  from  organism 
speeifie  databases,  sueh  as  EeoCye  [49],  or  more  general  allosterie  databases,  sueh  as  the  AlloSterie 
Database  [50].  However,  for  those  enzymes  that  have  not  been  well  studied,  we  will  need  to  infer 
allosterie  interaetions  from  experimental  data.  In  general,  the  reverse  engineering  of  regulatory  network 
strueture  from  data  is  a  diffieult  problem.  Reeently,  Sauer  and  eolleagues  have  developed  a  systematie, 
model-based  approach  for  the  identification  of  allosteric  regulation  in  vivo  [51].  They  tested  the 
effects  of  many  putative  allosteric  protein-metabolite  interactions  on  the  performanee  of  a  kinetic  model 
of  glyeolysis  against  dynamic  metabolomic  and  fluxomic  measurements.  A  method  similar  to  this 
may  be  easily  applied  to  eell-free  systems  in  order  to  identify  relevant  in  vitro  allosteric  interactions. 
Because  omics  measurements  of  cell-free  environments  are  easy  to  obtain,  identification  of  large-scale 
allosterie  eontrol  struetures  may  be  possible.  Also,  there  are  many  different  approaehes  from  the 
reverse  engineering  of  gene  regulatory  networks  that  perhaps  eould  be  adopted  to  this  problem,  however 
this  remains  an  open  question.  For  example,  one  eould  imagine  designing  pulse  ehase  experiments 
whieh  maximally  distinguish  between  competing  allosterie  models,  similar  to  the  earlier  work  of 
Kremling  et  aZ.  [52],  or  iteratively  estimating  model  struetures  similar  to  Doyle  and  eoworkers  [53]. 
Lastly,  the  ehoiee  of  max/min  integration  rules  or  the  partieular  form  of  the  transfer  functions  could  be 
generalized  to  inelude  other  rule  types  and  functions.  Theoretically,  an  integration  rule  is  a  function 
whose  domain  is  a  set  of  transfer  funetion  inputs,  and  whose  range  is  u  G  [0, 1].  Thus,  integration  rules 
other  than  max/min  eould  be  used,  sueh  as  the  mean  or  the  produet,  assuming  the  range  of  the  transfer 
funetions  is  always  /  G  [0,1].  Alternative  integration  rules  sueh  as  the  mean  might  have  different 
properties  whieh  could  influence  model  identifieation  or  performance.  For  example,  a  mean  integration 
rule  would  be  differentiable,  which  allows  derivative-based  optimization  approaehes  to  be  used. 
The  particular  form  of  the  transfer  funetion  eould  also  be  explored.  We  choose  a  Hill-like  function 
because  of  its  prominenee  in  the  systems  and  synthetie  biology  eommunity.  However,  the  only 
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mathematical  requirement  for  a  transfer  function  is  that  it  map  a  non-negative  continuous  or  categorical 
variable  into  the  range  /  G  [0, 1].  Thus,  many  types  of  transfer  functions  are  possible. 

4.  Materials  and  Methods 


4.1.  Formulation  and  Solution  of  the  Model  Equations 


We  used  ordinary  differential  equations  (ODEs)  to  model  the  time  evolution  of  metabolite  {Xi)  and 
scaled  enzyme  abundance  (Cj)  in  hypothetical  cell-free  metabolic  networks: 


dr-  ^ 

=  X]  (x,  e,  k)  i  =  l,2, 
i=i 
de,- 

—  =  -XiCi  i  =  l,2,...,£ 

dt 


(1) 

(2) 


where  TZ  denotes  the  number  of  reactions,  Ai  denotes  the  number  of  metabolites  and  8  denotes  the 
number  of  enzymes  in  the  model.  The  quantity  rj  (x,  e,  k)  denotes  the  rate  of  reaction  j.  Typically, 
reaction  j  is  a  non-linear  function  of  metabolite  and  enzyme  abundance,  as  well  as  unknown  kinetic 
parameters  k  (/C  x  1).  The  quantity  aij  denotes  the  stoichiometric  coefficient  for  species  i  in  reaction  j. 
If  aij  >  0,  metabolite  i  is  produced  by  reaction  j.  Conversely,  if  atj  <  0,  metabolite  i  is  consumed  by 
reaction  j,  while  a^j  =  0  indicates  metabolite  i  is  not  connected  with  reaction  j.  Lastly,  A*  denotes  the 
scaled  enzyme  degradation  constant.  The  system  material  balances  were  subject  to  the  initial  conditions 
X  (to)  =  Xo  and  e  (to)  =  1  (initially  we  have  100%  cell-free  enzyme  abundance). 

Each  reaction  rate  was  written  as  the  product  of  two  terms,  a  kinetic  term  (fj)  and  a  regulatory 
term  (vj): 

Tj  (x,  e,  k)  =  fjVj  (3) 


We  used  multiple  saturation  kinetics  to  model  the  reaction  term  rj: 
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where  denotes  the  maximum  rate  for  reaction  j,  denotes  the  scaled  enzyme  activity  which 
catalyzes  reaction  j,  and  Kjg  denotes  the  saturation  constant  for  species  s  in  reaction  j.  The  product 
in  Equaion  (4)  was  carried  out  over  the  set  of  reactants  for  reaction  j  (denoted  as  rn~). 

The  allosteric  regulation  term  Vj  depended  upon  the  combination  of  factors  which  influenced  the 
activity  of  enzyme  i.  Eor  each  enzyme,  we  used  a  rule-based  approach  to  select  from  competing  control 
factors  (Eigure  2).  If  an  enzyme  was  activated  by  m  metabolites,  we  modeled  this  activation  as: 


Vj  =  max  (fij  (Z) ,...,  fmj  (Z))  (5) 

where  0  <  fj  (Z)  <  1  was  a  regulatory  transfer  function  that  calculated  the  influence  of  metabolite  i  on 
the  activity  of  enzyme  j.  Conversely,  if  enzyme  activity  was  inhibited  by  a  m  metabolites,  we  modeling 
this  inhibition  as: 


Vj  =  1-  max  (fij  (Z),...,  f^j  (Z)) 


(6) 
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Lastly,  if  an  enzyme  had  both  m  aetivating  and  n  inhibitory  faetors,  we  modeled  the  regulatory 
term  as: 

Vj  =  min  {uj,dj)  (7) 

where: 


Uj  =  max  {fij  {Z) fmj  {Z))  (8) 

i+ 

dj  =  1  -  max  {fij  {Z) , . . . ,  fnj  {Z))  (9) 


The  quantities  j~^  and  j~  denoted  the  sets  of  aetivating  and  inhibitory  faetors  for  enzyme  j. 
If  an  enzyme  had  no  allosterie  faetors,  we  set  Vj  =  1.  There  are  many  possible  funetional  forms  for 
0  <  fij  (Z)  <  1.  However,  in  this  study,  eaeh  individual  transfer  funetion  took  the  form: 


1  + 


(10) 


where  Zj  denotes  the  abundanee  of  the  j  faetor  (e.g.,  metabolite  abundanee),  and  Kij  and  r]  are  eontrol 
parameters.  The  Kij  parameter  represents  a  speeies  gain  parameter,  while  r;  is  a  eooperativity  parameter 
(similar  to  a  Hill  coeffieient).  In  the  case  rj  >  1,  the  allosteric  interaction  displays  positive  eooperativity. 
For  ?7  <  1,  the  interaction  is  negatively  cooperative.  Finally,  if  rj  =  1,  the  interaction  displays  no 
eooperativity.  The  effect  of  different  values  of  rj  on  reaction  rate  can  be  seen  in  Figure  3.  The  model 
equations  were  encoded  using  the  Octave  programming  language  and  solved  using  the  LSODE  routine 
in  Octave  [54].  In  some  cases,  metabolic  fluxes  (or  other  quantities)  were  scaled  according  to: 


fj  {t  =  r) 


Tj  —  mm  r 
max  r  —  min  r 


t=T 


(11) 


where  0  <  f  j  (f  =  r)  <  1  denotes  the  scaled  value  for  flux  j  evaluated  at  time  r.  We  have  used  this 
scaling  in  a  variety  of  other  contexts  [45,55]. 


Estimation  of  model  parameters  and  structures  from  synthetic  experimental  data.  Model 
parameters  were  estimated  by  minimizing  the  difference  between  simulations  and  synthetic  experimental 
data  (squared  residual): 


min 

k 


(12) 


where  xj  (r)  denotes  the  measured  value  of  species  j  at  time  r,  Xj  (r,  k)  denotes  the  simulated  value 
for  species  j  at  time  r,  and  ojj  (r)  denotes  the  experimental  measurement  variance  for  species  j  at 
time  r.  The  outer  summation  is  respect  to  time,  while  the  inner  summation  is  with  respect  to  state. 
We  approximated  a  realistic  model  identification  scenario,  assuming  noisy  experimental  data,  limited 
sampling  resolution  (approximately  20  min  per  sample)  and  a  limited  number  of  measurable  metabolites. 
We  assumed  a  constant  coefficient  of  variation  of  10%  for  the  synthetic  data  set. 

We  minimized  the  model  residual  using  particle  swarm  optimization  (PSO)  [56].  PSO  uses  a 
swarming  metaheuristic  to  explore  parameter  spaces.  A  strength  of  PSO  is  its  ability  to  find  the  global 
minimum,  even  in  the  presence  of  potentially  many  local  minima,  by  communicating  the  local  error 
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landscape  experienced  by  each  particle  collectively  to  the  swarm.  Thus,  PSO  acts  both  as  a  local  and  a 
global  search  algorithm.  For  each  iteration,  particles  in  the  swarm  compute  their  local  error  by  evaluating 
the  model  equations  using  their  specific  parameter  vector  realization.  From  each  of  these  local  points, 
a  globally  best  error  is  identified.  Both  the  local  and  global  error  are  then  used  to  update  the  parameter 
estimates  of  each  particle  using  the  rules: 

+  ^2^1  {Ci  —  kj)  +  03r2  {Q  —  kj)  (13) 

k,  =  ki  +  Ai  (14) 


where  Aj  denotes  the  perturbation  to  the  vector  of  parameters  k*  for  particle  i.  (6*i,  6*2,  6*3)  are  adjustable 
parameters,  Ci  denotes  the  best  local  solution  found  by  particle  i,  and  Q  denotes  the  best  solution  found 
over  the  entire  population  of  particles.  The  quantities  ri  and  r2  denote  uniform  random  vectors  with 
the  same  dimension  as  the  number  of  unknown  model  parameters  (/C  x  1).  In  this  study,  we  used 
(6*1, 6*2,  6*3)  =  (1.0,0.05564,0.02886).  The  quality  of  parameter  estimates  was  measured  using  two 
criteria,  goodness  of  fit  (model  residual)  and  angle  between  the  estimated  parameter  vector  k^  and  the 
true  parameter  set  k* : 


ttj  =  cos 


-1 


k,  ■  k* 


(15) 


If  the  candidate  parameter  set  k^  were  perfect,  the  residual  between  the  model  and  synthetic  data  and 
the  angle  between  kj  and  the  true  parameter  set  k*  would  be  equal  to  zero. 

We  modified  our  PSO  implementation  to  simultaneously  search  over  kinetic  parameters  and  putative 
model  control  structures.  In  the  combined  case,  each  particle  potentially  carried  a  different  model 
realization  in  addition  to  a  different  kinetic  parameter  vector.  We  kept  the  update  rules  the  same  (along 
with  the  update  parameters).  Thus,  each  particle  competed  on  the  basis  of  goodness  of  fit,  which  allowed 
different  model  structures  to  contribute  to  the  overall  behavior  of  the  swarm.  We  considered  five  possible 
model  structures  (A  through  E),  where  network  A  was  the  correct  formulation  (used  to  generate  the 
synthetic  data).  We  considered  a  population  of  100  particles,  where  each  particle  in  the  swarm  was 
assigned  a  model  structure,  and  a  random  parameter  vector.  The  optimization  simulations  shown  in 
Figure  7  required  several  hours  to  complete  on  a  single  CPU  Apple  workstation  (Apple,  Cupertino,  CA, 
USA;  OS  X  vlO.lO).  The  PSO  algorithm,  model  equations,  and  the  objective  function  were  encoded  and 
solved  in  the  Octave  programming  language  [54]. 
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